The present disclosure relates to portable document format (PDF) extraction (also referred to as a “PDF extraction program”), and more specifically to information extraction from a standardized PDF report in a non-paragraph format.
A PDF is based on the PostScript language and captures a complete description of a fixed-layout flat document. A fixed-layout flat document includes not only content such as text and images, but also metadata such as the position (x and y coordinates) and the font of each piece of content.
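As an illustration of how such positional metadata can support extraction from a non-paragraph layout, the following is a minimal sketch and not the disclosed method. The `TextSpan` type, the `group_into_rows` helper, the tolerance value, and the sample values are all hypothetical; the sketch assumes the text fragments have already been read from the PDF by some parser and that y coordinates increase toward the top of the page, as in the default PDF coordinate space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSpan:
    """Hypothetical record for one text fragment and its layout metadata."""
    text: str
    x: float      # horizontal position on the page
    y: float      # vertical position on the page (assumed to grow upward)
    font: str     # font name reported for the fragment
    size: float   # font size in points


def group_into_rows(spans, y_tolerance=2.0):
    """Group fragments whose y coordinates nearly match into visual rows.

    Rows are returned top to bottom; fragments in a row run left to right.
    """
    rows = []
    # Visit fragments from the top of the page down, left to right.
    for span in sorted(spans, key=lambda s: (-s.y, s.x)):
        # Compare against the first fragment of the most recent row.
        if rows and abs(rows[-1][0].y - span.y) <= y_tolerance:
            rows[-1].append(span)
        else:
            rows.append([span])
    return [sorted(row, key=lambda s: s.x) for row in rows]


# Made-up fragments standing in for a two-column, label/value report.
spans = [
    TextSpan("42", 200.0, 700.0, "Helvetica", 10.0),
    TextSpan("Field A", 72.0, 700.5, "Helvetica-Bold", 10.0),
    TextSpan("Field B", 72.0, 685.0, "Helvetica-Bold", 10.0),
    TextSpan("17", 200.0, 685.0, "Helvetica", 10.0),
]

rows = group_into_rows(spans)
for row in rows:
    print(" | ".join(span.text for span in row))
```

Because the layout is fixed, fragments that share a baseline can be paired as label and value even though the text is not written in paragraphs; the font field could be used in a similar way, for example to tell bold labels apart from plain values.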